BAYESIAN INFERENCE CONCERNING MANY PARAMETERS,
WITH REFERENCE TO SUPERSATURATED DESIGNS

F.  J.  Anscombe 
Princeton University

INTRODUCTION 

Real scientific judgments are not exactly reflected in the theory of consistent
behaviour of Ramsey, de Finetti, and Savage, since in the theory the person is
postulated to have infinitely fine perceptiveness. When the theory is applied to
well-defined  statistical  problems,  the  initial  probabilities  of  parameters  and  the 
utility  function  postulated  in  the  calculations  are  derived  from  rather  imprecise 
perceptions  or  judgments.  Often,  the  final  results  are  insensitive  to  moderate 
changes  in  the  initial  probabilities  and  the  utility  function,  and  it  is  natural  to 
choose  mathematical  expressions  for  these  functions  that  roughly  represent  the 
intuitive  judgments  and  are  at  the  same  time  mathematically  convenient  to 
handle.  Mistakes  are  rather  easily  made,  however.  That  is,  one  may  postulate 
mathematical  forms  for  the  initial  probability  distribution  or  for  the  utility 
function,  under  the  impression  that  intuition  is  thereby  represented  fairly,  when 
in  fact  the  postulated  forms  have  implications  at  variance  with  intuition. 

Statistical Methodology I

It is quite common in statistics to work with very simple loss functions,
notably with quadratic loss functions and also with two-valued loss functions.
In many cases these are good enough. But a quadratic loss function is not so
good when associated with a probability distribution having an infinite variance.
The  trouble  here  is  obvious,  and  usually  there  will  be  no  hesitation  over  taking 
the  obvious  remedy,  of  changing  the  loss  function.  It  is  possible  also  for  there 
to  be  a  conflict  between  the  data  and  the  initial  distribution  of  parameters.  This 
may  not  be  altogether  obvious,  though  if  it  is  noticed  either  the  data  or  the 
initial distribution will have to be rejected or modified. Suppose, as an extreme
example, that a person proposes to take readings of the weight $\mu$ of an object,
and let it be given that the readings will differ from $\mu$ by a chance "error" having
a normal distribution with zero mean and 1 g. standard deviation. Before the
readings are taken, the person may judge that $\mu$ will lie between 200 g. and
300 g., and having no particular opinion as to where $\mu$ will come in this range
he may be incautious enough to postulate a uniform distribution for the initial
probability over just this interval, with zero probability outside it. Suppose
now that he takes some readings, and finds them to be somewhere near 500 g.
If after careful checking he accepts the readings as genuine and trustworthy,
and is prepared to conclude that, contrary to prior expectation, $\mu$ is indeed in
the neighbourhood of 500 g., then his asserted initial distribution must have
been wrong. He did not really mean to imply absolute conviction that $\mu$ lay
between 200 g. and 300 g. If he had had such a conviction, he would have been
forced to reject the observations as spurious, or else, by a long stretch of the
imagination amounting to lunacy, to have concluded from Bayes's theorem that
$\mu$ was probably very close to 300 g.
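
The weighing example can be checked numerically. The following is a minimal sketch, assuming three hypothetical readings near 500 g. and a simple grid approximation to the posterior; the function name and the readings are illustrative and not part of the original argument.

```python
import math

def truncated_uniform_posterior_mean(readings, lo=200.0, hi=300.0, sd=1.0, steps=200_000):
    """Posterior mean of mu under a uniform prior on [lo, hi] and normal errors."""
    n = len(readings)
    ybar = sum(readings) / n
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    # Log-likelihood of the sample, as a function of mu, up to a constant.
    loglik = [-n * (ybar - mu) ** 2 / (2.0 * sd ** 2) for mu in grid]
    top = max(loglik)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(v - top) for v in loglik]
    total = sum(weights)
    return sum(w * mu for w, mu in zip(weights, grid)) / total

# Hypothetical readings near 500 g.
posterior_mean = truncated_uniform_posterior_mean([499.2, 500.5, 500.1])
print(round(posterior_mean, 3))
```

Under these assumptions the posterior piles up against the upper end of the prior interval, which is the conclusion described above as a stretch of the imagination.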

It is customary to express the total probability distribution for an experiment,
extending over the Cartesian product of a "sample space" and a "parameter
space," by two component distributions, (i) the conditional distribution over
the sample space, given any point in the parameter space (this is often called
the statistical specification, or class of admissible hypotheses, or "model" for
the experiment), and (ii) the marginal distribution over the parameter space,
usually referred to simply as the initial or "prior" distribution. Usually, the
former is supposed to be a chance distribution with frequency interpretation,
whereas the latter represents subjective opinion. It is well known that the
observations  are  capable  of  contradicting  the  first  type  of  distribution,  as  for 
example  when  the  distribution  is  asserted  to  be  normal  but  the  observations  are 
strikingly  discrepant  with  the  hypothesis  of  normality.  It  does  not  seem  to  be 
generally  realized  that  the  observations  can  just  as  clearly  contradict  the  second 
type  of  distribution. 

This  is  liable  to  happen,  in  particular,  if  there  are  many  parameters.  Suppose 
that according to the statistical specification the observations $y_i$ ($i = 1, 2, \ldots, n$)
have normal independent chance distributions with known common variance
and mean values expressed as linear functions of some parameters. It is tempting
to postulate as the initial distribution for the parameters some particular
multivariate normal distribution, because such a distribution is "conjugate" to the
statistical specification (Raiffa and Schlaifer [6]). However, by a (fully known)
non-singular linear transformation, the parameters can be made to have
independent normal initial distributions with zero means and unit variances; and
now, if the number of parameters is large, by the law of large numbers we
deduce  that  a  certain  linear  function  of  the  observations  has  high  probability 
of  being  close  to  zero,  and  that  a  certain  quadratic  function  of  the  observations 
has  high  probability  of  being  close  to  one,  and  similar  statements  about  higher 
moments.  But  these  particular  functions  of  the  observations  may  in  fact  yield 
values  that  are  far  from  the  predicted  values.  Perceiving  this  difficulty,  Duncan 
[3], and possibly also Dunnett [4], have hinted that the initial distribution for
the parameters should not be completely postulated in advance, but should be
permitted to have one or more adjustable constants in it, which would be chosen,
after a preliminary analysis of the observations, to avoid one or more specific
conflicts  of  the  above  sort.  Such  an  adjusted  distribution  cannot  be  said  to 
represent  prior  opinion  directly,  since  it  depends  on  the  observations.  Its  use 
may  possibly  be  justifiable,  but  we  may  reasonably  ask  to  see  the  justification. 
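
The law-of-large-numbers effect can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: it looks directly at a set of standardized values rather than at functions of the observations, and the sample size and the "true" dispersion of 0.5 are hypothetical.

```python
import math
import random

def mean_square_z_score(values):
    """z-score of the mean square of `values` against the N(0, 1) prediction.

    Under independent N(0, 1) draws the mean square has expectation 1 and
    standard deviation sqrt(2 / m), where m = len(values).
    """
    m = len(values)
    mean_square = sum(v * v for v in values) / m
    return (mean_square - 1.0) / math.sqrt(2.0 / m)

rng = random.Random(0)
m = 400
# Values that really follow the postulated unit-variance distribution.
agree = [rng.gauss(0.0, 1.0) for _ in range(m)]
# Hypothetical values only half as dispersed as the postulated distribution asserts.
conflict = [rng.gauss(0.0, 0.5) for _ in range(m)]

z_agree = mean_square_z_score(agree)
z_conflict = mean_square_z_score(conflict)
print(round(z_agree, 2), round(z_conflict, 2))
```

With many parameters even a modest misjudgment of scale shows up as a discrepancy of many standard deviations, which is the kind of conflict described above.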

The  main  purpose  of  this  paper  is  to  illustrate  a  more  satisfying  way  of  assigning 
the  initial  distribution  for  the  parameters,  in  a  problem  where  there  are  many 
parameters.  We  consider  an  experiment  in  which  a  large  number  of  factors  are 
tested or "screened." It is expected that most of the factors will have small
effects,  and  importance  attaches  to  identifying  any  factors  that  have  substantial 
effects.  We  introduce  an  initial  distribution  expressing  indifference  of  opinion 
concerning  the  factor  responses,  but  a  certain  expectation  as  to  the  relative 
magnitudes  of  the  responses  in  aggregate;  such  a  distribution  should  appear 
reasonable  (we  believe)  for  some  kinds  of  exploratory  experiments.  Conditions 
on  the  design  that  facilitate  the  ensuing  analysis  are  formulated.  Particular 
attention  is  paid  to  supersaturated  designs,  for  which  the  number  of  parameters 
exceeds the number of observations.

Our  formulation  of  this  problem  is  based  on  unpublished  work  by  Beale  and 
Mallows  [1],  to  whom  is  due  the  idea  of  modifying  an  ordinary  least-squares 
analysis  in  the  light  of  a  suitably  chosen  prior  distribution.  On  the  subject  of 
screening  experiments  in  general,  including  supersaturated  experiments,  see 
Satterthwaite  [7]  and  the  associated  discussion.  Booth  and  Cox  [2]  have  recently 
considered  the  construction  of  supersaturated  designs,  under  a  condition  (not 
imposed here) that each factor may be tested at only two levels. The
considerations of this paper have some relation to Stein's work on many-parameter
estimation,  as  for  example  [8]  and  [9]. 

Tiao  and  Zellner  [10]  have  treated  essentially  the  same  problem,  in  a  paper 
that  came  to  hand  after  this  was  written. 


A  FACTOR-SCREENING  EXPERIMENT 


Specification  and  notation 

Let $y_1, y_2, \ldots, y_n$ be readings obtained in a factorial experiment in which $f$
factors are varied. Let the following specification be postulated, implying that
the effects of the factors are linear and additive:

(1)  $y_i = \mu + \sum_r x_{ir} \beta_r + \epsilon_i \qquad (i = 1, 2, \ldots, n).$
Here the coefficients $x_{ir}$, representing the levels of the factors, are known and
freely adjustable in the planning of the experiment; the mean $\mu$ and the response
coefficients $\beta_r$ are parameters whose values are not certainly known; and the
errors $\epsilon_i$ are realizations of chance variables independently and identically
distributed in a normal distribution with mean 0 and variance $\sigma^2$.

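Let $X$ denote the $n \times (f + 1)$ matrix of coefficients of the parameters $\mu$, $\beta_r$
in (1), that is,

(2)  $X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1f} \\ 1 & x_{21} & x_{22} & \cdots & x_{2f} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nf} \end{pmatrix}.$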


By redefining $\mu$ and the zeros of the factor levels if necessary, we can make
every other column of $X$ orthogonal to the first column. Thus we shall suppose

(3)  $\sum_i x_{ir} = 0 \qquad (r = 1, 2, \ldots, f).$

We shall also suppose that $X$ has the greatest possible rank for its size, namely
$\min(n, f + 1)$. We shall be interested in large values for $n$ and $f$, but the formal
requirement in what follows is merely that $n > 3$, $f > 1$. If $f \le n - 2$, we say
that the design is unsaturated; the parameters $\mu$, $\beta_r$ can be estimated by the
method of least squares, and $\sigma^2$ can be estimated from the residual sum of
squares. If $f = n - 1$, the design is saturated, and the method of least squares
yields estimates of $\mu$ and the $\beta_r$ but not of $\sigma^2$. If $f \ge n$, the design is supersaturated,
and the method of least squares yields a unique estimate for $\mu$ but not a unique
estimate for the $\beta_r$.

It will be convenient to consider a reduced form for all but the first column
of $X$. Let $T$ denote an $n \times n$ orthogonal matrix, with entries $t_{ij}$, such that
every entry in the first row is equal to $1/\sqrt{n}$. Then $T$ transforms the observations
$(y_i)$ to $(u_i)$, where

(4)  $u_i = \sum_j t_{ij} y_j$

and

(5)  $u_1 = \sum_j y_j / \sqrt{n} = \sqrt{n}\, \bar{y}.$

Because of (3), $X$ is transformed to

(6)  $TX = \begin{pmatrix} \sqrt{n} & 0 & \cdots & 0 \\ 0 & w_{21} & \cdots & w_{2f} \\ \vdots & \vdots & & \vdots \\ 0 & w_{n1} & \cdots & w_{nf} \end{pmatrix}.$

Let $W$ stand for the $(n - 1) \times f$ matrix of coefficients $w_{ir}$ ($i = 2, 3, \ldots, n$;
$r = 1, 2, \ldots, f$). For some purposes, $W$ is a more convenient description of the
design than $X$.
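
The reduction from $X$ to $W$ can be sketched numerically. The code below is illustrative only: it assumes NumPy, builds one admissible $T$ by a QR factorization (any orthogonal matrix with constant first row would do), and uses a small random design with hypothetical factor levels.

```python
import numpy as np

def averaging_orthogonal_matrix(n, seed=0):
    """Return an n x n orthogonal matrix whose first row is constant 1/sqrt(n)."""
    rng = np.random.default_rng(seed)
    basis = np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))])
    q, _ = np.linalg.qr(basis)      # first column of q is +/- ones / sqrt(n)
    if q[0, 0] < 0:
        q[:, 0] = -q[:, 0]          # flipping a column keeps q orthogonal
    return q.T

n, f = 6, 3
rng = np.random.default_rng(1)
levels = rng.standard_normal((n, f))    # hypothetical factor levels
levels -= levels.mean(axis=0)           # enforce condition (3): column sums are zero
X = np.column_stack([np.ones(n), levels])

T = averaging_orthogonal_matrix(n)
TX = T @ X
W = TX[1:, 1:]                          # the (n - 1) x f matrix of the text
print(np.round(TX, 3))
```

The printed matrix should show $\sqrt{n}$ in the top-left corner and zeros in the rest of the first row and first column, as in (6).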
Let $C$ stand for the $f \times f$ matrix $W'W$ (where the prime denotes the transpose),
having entries

(7)  $c_{rs} = \sum_{(i)} w_{ir} w_{is}.$

In summations, the suffixes $r$, $s$ will be understood to run always from 1 to $f$,
and the suffixes $i$, $j$ to run from 1 to $n$, unless the suffix is enclosed in
parentheses, as above in (7), in which case it will be understood to run from 2 to $n$.

Purpose

The purpose of conducting the experiment and observing the $y_i$ is to make
inferences concerning the response coefficients $\beta_r$. We shall express the results
of the experiment primarily in the form of a posterior distribution for the $\beta_r$. We
shall also consider the particular problem of selecting from among the factors
any for which the corresponding response coefficient is substantially different
from zero. This problem will be characterized by the following loss function.

We suppose that, associated with the $r$th factor, there is a given number
$h_r$, serving as a threshold of importance, such that if $|\beta_r|$ were known to exceed
$h_r$ we should prefer to classify the $r$th factor as "interesting" and retain it for
future investigation, whereas if $|\beta_r|$ were known to be less than $h_r$ we should
prefer to give the verdict "uninteresting" and discard the factor. Let the utility
loss in discarding the factor be supposed equal to $\beta_r^2$, and the cost of retaining
the factor for further investigation be supposed independent of $\beta_r$, and therefore
equal to $h_r^2$. Then if the value of $\beta_r$ is not known certainly, the decision to retain
the factor will be preferred if

(8)  $E(\beta_r^2) > h_r^2,$

and the decision to discard will be preferred if the inequality is reversed. Here
the expectation is with respect to the available marginal probability distribution
for $\beta_r$.
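
As a small illustration of rule (8), the sketch below uses the identity $E(\beta_r^2) = [E(\beta_r)]^2 + \mathrm{var}(\beta_r)$. The factor labels, posterior summaries and threshold are hypothetical values chosen only to show the rule at work.

```python
def retain_factor(posterior_mean, posterior_variance, threshold):
    """Decision rule (8): retain the factor when E(beta_r^2) > h_r^2."""
    expected_square = posterior_mean ** 2 + posterior_variance
    return expected_square > threshold ** 2

# Hypothetical posterior summaries (mean, variance) and a common threshold h_r = 0.5.
summaries = {"A": (0.10, 0.04), "B": (0.90, 0.04), "C": (0.00, 1.50)}
decisions = {name: retain_factor(m, v, threshold=0.5) for name, (m, v) in summaries.items()}
print(decisions)
```

Factor C is retained although its posterior mean is zero, because the rule depends on the expected square and so also rewards remaining uncertainty.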

Thus in addition to determining a joint posterior distribution for all the
response coefficients $\beta_r$, given the results of the experiment, we shall be interested
in the value of $E(\beta_r^2)$ for each $r$, with respect to this posterior distribution.

Initial probability distribution for the parameters

We have to supply a joint initial distribution for the parameters $\mu$, $\sigma^2$, and
the $\beta_r$. For $\mu$ we shall postulate the uniform distribution over the whole real
line, independent of the distribution for the other parameters. Our procedure
of analysis will therefore be invariant under changes in the origin of
measurement of the $y_i$. For $\sigma^2$ it will be simplest to consider the possibility that $\sigma^2$ is
certainly known. But if $\sigma^2$ is not known certainly it will be supposed to have
the following initial-probability element, independent of the distribution of the
other parameters:

(9)  $(\sigma^{-2})^{k/2 - 1} \exp[-a^2 / 2\sigma^2] \, d(\sigma^{-2}),$

where $k$ and $a$ are given positive numbers. (There is no need to insert a
normalizing factor, since it will cancel out when we use Bayes's theorem.)
As for the response coefficients $\beta_r$, we wish to express a prior belief that most
of them will be small. If $f$ is large, no joint normal distribution for the $\beta_r$ will
do, because of the law-of-large-numbers effect, as already mentioned.

Observe first that if all the entries in any column of $W$ (or in any column
of $X$ other than the first) are multiplied by a non-zero constant and the
corresponding $\beta$ divided by that constant, the problem is essentially unchanged. Let
us suppose the columns have been so adjusted that the initial opinion concerning
the magnitude of each $\beta$ is the same. The set of $\beta$'s, if we could observe them,
would look like a sample from some population. Let us consider the possibility
of postulating that the population would be normal and have zero mean.

The suggestion that initially each $\beta$ has zero expectation and that the joint
distribution of the $\beta$'s is invariant under permutations can be given a certain
sort of objective validity by performing two acts of randomization: (a) each
column of $W$ (or each column of $X$ other than the first), together with the
corresponding $\beta$, is either unchanged or multiplied by $-1$, according to the flip
of a fair coin, choices being independent for different columns; (b) the $f$ factors
are given a random permutation before being numbered from 1 to $f$.

These randomizations do no harm, and, like other more usual randomizations,
are probably advisable as a protection against unconscious biases and
specification errors if a series of similar experiments is to be performed. But after the
results of the randomizations are known, the fact of randomization has no
bearing on the plausibility of a normal population of $\beta$'s with zero mean. The
normality of the distribution is open to empirical refutation, after a large number
of $\beta$'s have been estimated, just as the assumption that the $\epsilon$'s have a normal
distribution is open to empirical refutation, provided there is enough replication
in the experiment. Both normality assumptions are logically alike and may be
regarded as belonging to the "specification" rather than to the "initial
distribution" part of the total assumed probability distribution for the experiment. We
then have only three parameters for which we have to assert initial distributions,
namely $\mu$ and $\sigma^2$, as already discussed, and also $\tau^2$, the variance of the normal
population from which the $\beta$'s are supposed drawn. For the latter let us postulate
the initial-probability element

(10)  $(\tau^{-2})^{l/2 - 1} \exp[-b^2 / 2\tau^2] \, d(\tau^{-2}),$

where $l$ and $b$ are given positive numbers. The mathematical forms for (9) and
(10) have, of course, been chosen for mathematical convenience. Each
distribution may be adjusted to have any mean and any variance.

Thus the total probability element for the experiment is proportional to

(11)  $\exp\Bigl[-\sum_i \bigl(y_i - \mu - \sum_r x_{ir}\beta_r\bigr)^2 / 2\sigma^2 - \sum_r \beta_r^2 / 2\tau^2 - b^2 / 2\tau^2\Bigr] (\tau^{-2})^{(l+f)/2 - 1} \prod_i dy_i \prod_r d\beta_r \, d\mu \, d(\tau^{-2})$

if $\sigma^2$ is known certainly. If $\sigma^2$ is not known certainly but has the initial
distribution (9), then the total probability element is proportional to (11) multiplied
by the further factor

(12)  $(\sigma^{-2})^{(k+n)/2 - 1} \exp[-a^2 / 2\sigma^2] \, d(\sigma^{-2}).$

But for the present let us suppose $\sigma^2$ known.

If we integrate (11) with respect to $\tau^2$, we obtain the marginal distribution for
all the other variables. The probability element is proportional to the product
of the following two factors, which we separate for ease of discussion:

(13)  $\exp\Bigl[-\sum_i \bigl(y_i - \mu - \sum_r x_{ir}\beta_r\bigr)^2 / 2\sigma^2\Bigr] \prod_i dy_i \, d\mu$

and

(14)  $\bigl(b^2 + \sum_r \beta_r^2\bigr)^{-(l+f)/2} \prod_r d\beta_r.$

Thus our assumption about the $\beta_r$, that they are a sample from a normal
population having zero mean and variance $\tau^2$, where $\tau^2$ has the initial
distribution (10), is equivalent to postulating (14) as the joint initial distribution for
the $\beta_r$. The distribution (14) has the following properties. The marginal
distribution for any one response coefficient, say $\beta_1$, is proportional to

(15)  $(b^2 + \beta_1^2)^{-(l+1)/2} \, d\beta_1,$

and this has a great dispersion if $l$ is small. But the conditional distribution for
$\beta_1$, given all the other $\beta$'s, and given, in particular, that
$\beta_2^2 + \beta_3^2 + \cdots + \beta_f^2 = \rho_1^2$, say, is proportional to

(16)  $(b^2 + \rho_1^2 + \beta_1^2)^{-(l+f)/2} \, d\beta_1,$

and this is close to a normal distribution with zero mean and variance

$(b^2 + \rho_1^2)/(l + f - 1)$

if $f$ is large. These properties are intuitively satisfactory, and do not resemble
those of any joint normal distribution.
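
The two-stage description of the prior can be sketched as a sampler: draw the precision $\tau^{-2}$ from (10), then draw the coefficients from a normal population with that variance. This is a minimal illustration using only the Python standard library; the function names and the numerical settings are assumptions made for the example.

```python
import math
import random

def sample_betas(f, l, b, rng):
    """Draw (beta_1, ..., beta_f) by first drawing the precision tau^-2 from (10)."""
    # (10) is a gamma density in tau^-2 with shape l/2 and rate b^2/2;
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    precision = rng.gammavariate(l / 2.0, 2.0 / b ** 2)
    tau = 1.0 / math.sqrt(precision)
    return [rng.gauss(0.0, tau) for _ in range(f)]

def conditional_variance(b, rho_sq, l, f):
    """Approximate variance of beta_1 given the other betas, as read off from (16)."""
    return (b ** 2 + rho_sq) / (l + f - 1)

rng = random.Random(0)
# With l large and b^2 = l the precision is concentrated near 1.
betas = sample_betas(f=5000, l=2000, b=math.sqrt(2000.0), rng=rng)
mean_square = sum(v * v for v in betas) / len(betas)
print(round(mean_square, 3), conditional_variance(1.0, 9.0, 1, 10))
```

Choosing a small value of `l` instead makes the common variance highly uncertain, so that a single coefficient has a widely dispersed marginal distribution while a large set of coefficients still shares one scale.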

INFERENCES FROM THE EXPERIMENT

Generalities 

The posterior distribution for the $\beta_r$ is derived (according to Bayes's theorem)
from the total probability distribution for the experiment, (13) and (14), by
integrating with respect to the unwanted parameter $\mu$ and conditioning on the
observed values of the $y_i$. The factor (14) is not affected by these operations,
but (13) becomes

(17)  $\exp\Bigl[-\sum_i \bigl(y_i - \bar{y} - \sum_r x_{ir}\beta_r\bigr)^2 / 2\sigma^2\Bigr],$

where $y_i$ and $\bar{y}$ now stand for the observed values. Thus the desired posterior
distribution is proportional to the product of (14) and (17).

This is for $\sigma^2$ a known constant. If $\sigma^2$ is a random variable, the factor (12)
must also be reckoned with. The integration with respect to $\mu$ changes the $n$
in (12) to $n - 1$, and then it is necessary to integrate also with respect to $\sigma^2$.
The resulting desired posterior distribution for the $\beta_r$ is proportional to the
product of (14) and

(18)  $\Bigl[a^2 + \sum_i \bigl(y_i - \bar{y} - \sum_r x_{ir}\beta_r\bigr)^2\Bigr]^{-(k+n-1)/2}.$

Note that, in regard to the sum of squares appearing in (17) and (18), we
have:

(19)  $\sum_i \bigl(y_i - \bar{y} - \sum_r x_{ir}\beta_r\bigr)^2 = \sum_{(i)} \bigl(u_i - \sum_r w_{ir}\beta_r\bigr)^2.$

Let $(\beta_r^*)$ denote a set of values for the $\beta_r$ that minimizes this sum of squares. If
the design is saturated or supersaturated, the minimized sum vanishes. $(\beta_r^*)$ is
uniquely determined by the minimization if the design is unsaturated or saturated,
i.e. if $f \le n - 1$. For a supersaturated design, we shall impose below a further
condition which will determine $(\beta_r^*)$ uniquely. In any case, the sum of squares
(19) can be expressed, from (7), as

(20)  $\sum_{(i)} \bigl(u_i - \sum_r w_{ir}\beta_r^*\bigr)^2 + \sum_r \sum_s c_{rs} (\beta_r - \beta_r^*)(\beta_s - \beta_s^*).$

The first half of this expression, the minimized sum of squares, does not involve
the $\beta_r$, and so, ignoring it, we may replace (17), in the posterior distribution for
the $\beta_r$ when $\sigma^2$ is known, by

(21)  $\exp\Bigl[-\sum_r \sum_s c_{rs} (\beta_r - \beta_r^*)(\beta_s - \beta_s^*) / 2\sigma^2\Bigr].$

The corresponding expression for (18), in the posterior distribution for the $\beta_r$
when $\sigma^2$ is not known, is

(22)  $H^{-(k+n-1)/2},$

where $H = a^2 + \sum_{(i)} \bigl(u_i - \sum_r w_{ir}\beta_r^*\bigr)^2 + \sum_r \sum_s c_{rs} (\beta_r - \beta_r^*)(\beta_s - \beta_s^*)$.
For the rest of this paper, we shall
consider $\sigma^2$ known and work with (21), but our results can immediately be
translated into the corresponding expressions derived from (22).
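
The splitting of the sum of squares into a minimized part and a quadratic form in $C$ is the usual least-squares identity, and it can be checked numerically. The sketch below assumes NumPy and uses a small random unsaturated design; the matrix `W`, the vector `u` and the trial value `beta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_minus_1, f = 8, 3                         # an unsaturated case, f <= n - 1
W = rng.standard_normal((n_minus_1, f))     # hypothetical reduced design matrix
u = rng.standard_normal(n_minus_1)          # hypothetical transformed readings u_2, ..., u_n

C = W.T @ W                                 # equation (7)
beta_star, *_ = np.linalg.lstsq(W, u, rcond=None)   # least-squares minimizer

def sum_of_squares(beta):
    resid = u - W @ beta
    return float(resid @ resid)

beta = rng.standard_normal(f)               # an arbitrary trial value of the coefficients
d = beta - beta_star
lhs = sum_of_squares(beta)
rhs = sum_of_squares(beta_star) + float(d @ C @ d)
print(lhs, rhs)
```

The two printed numbers should agree to rounding error, whatever trial value of the coefficients is used.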

Under our assumption that $X$ is of maximum rank, $C$ is of rank $\min(n - 1, f)$.
Because the prior distribution (14) involves the simple equal-weighted sum
$\sum_r \beta_r^2$, the posterior distribution will be easiest to make computations with if
the experiment has been so designed that all the non-zero roots of $C$ are equal.
The common numerical value of these roots is arbitrary, because all the entries
in $X$, except those in the first column, or equivalently all the entries in $W$,
could be multiplied by a common non-zero constant, and all the $\beta_r$ and $b$ divided
by that constant, and the problem would be unchanged. For definiteness, we
shall choose the common value of the roots to be $n$. This permits the design
conditions below to be expressed as succinctly as possible, but with appropriate
slight changes in the formulas any other scaling could be used.

Let $g$ be the positive number defined by

(23)  $g^2 = \sum_r \beta_r^{*2}.$

We shall assume that $\beta_r^*$ does not vanish for every $r$, and hence that $g$ exists.


The case $f \le n - 1$ (unsaturated or saturated design)

The rank of $C$ is $f$. If all the roots of $C$ are equal to $n$, $C$ must be $n$ times the
identity matrix. We then have

(24)  $\beta_r^* = \sum_{(i)} w_{ir} u_i / n,$

and $n g^2$ is equal to the sum of squares for factor responses, of $f$ degrees of
freedom, in the ordinary analysis of variance of the observations. The implied
condition on $X$ is expressed in the following

DESIGN CONDITION ($f \le n - 1$). The columns of $X$ are orthogonal, and the sum
of squares of entries in every column is equal to $n$.

By an orthogonal transformation we can transform $(\beta_r)$ to $(\gamma_r)$, where

(25)  $\gamma_1 = \sum_r \beta_r^* \beta_r / g,$

and the posterior distribution for the $\gamma_r$ is proportional to

(26)  $\dfrac{\exp[-n\{(\gamma_1 - g)^2 + \gamma_2^2 + \cdots + \gamma_f^2\} / 2\sigma^2] \prod_r d\gamma_r}{(b^2 + \gamma_1^2 + \gamma_2^2 + \cdots + \gamma_f^2)^{(l+f)/2}}.$

This distribution obviously has circular symmetry about the $\gamma_1$-axis. Let

(27)  $\rho_1^2 = \gamma_2^2 + \gamma_3^2 + \cdots + \gamma_f^2.$

Then the joint posterior distribution for $\gamma_1$ and $\rho_1$ is proportional to

(28)  $\dfrac{\exp[-n\{(\gamma_1 - g)^2 + \rho_1^2\} / 2\sigma^2] \, \rho_1^{f-2} \, d\gamma_1 \, d\rho_1}{(b^2 + \gamma_1^2 + \rho_1^2)^{(l+f)/2}}.$

(26) or (28) is our primary answer to the inference problem.

For the particular inference problem of selecting "interesting" factors, we
need to evaluate $E(\beta_r^2)$. Inverting the transformation, we can express $\beta_r$ as a
linear form in the $\gamma_s$; the sum of squares of the coefficients is equal to 1, and
the coefficient of $\gamma_1$ is $\beta_r^*/g$. We have clearly

(29)  $E(\gamma_r \gamma_s) = 0$ for $r \ne s$,
      $E(\gamma_2^2) = E(\gamma_3^2) = \cdots = E(\gamma_f^2) = E(\rho_1^2)/(f - 1).$

Hence

(30)  $E(\beta_r^2) = (\beta_r^*/g)^2 E(\gamma_1^2) + [1 - (\beta_r^*/g)^2] E(\rho_1^2)/(f - 1).$

This can be calculated (for every value of $r$) as soon as $E(\gamma_1^2)$ and $E(\rho_1^2)$ have
been evaluated, and the latter may be found by numerical double integration
over the region $-\infty < \gamma_1 < \infty$, $0 < \rho_1 < \infty$, of three expressions in turn,
namely (28) as it stands, (28) multiplied by $\gamma_1^2$, and (28) multiplied by $\rho_1^2$. The
first of these is required in order to determine the normalizing factor for the
distribution (28).
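
The double integration can be sketched with a plain grid sum. The code below is one possible approach, not a prescribed method: it assumes NumPy, $f \ge 2$, and that the posterior mass lies within `width` standard errors of $g$ (an assumption that could fail for a very sharp prior). The estimates `beta_star` and all numerical settings are hypothetical.

```python
import numpy as np

def posterior_moments(g, n, f, sigma2, b2, l, m=400, width=10.0):
    """Return (E[gamma_1^2], E[rho_1^2]) under the density (28), for f >= 2."""
    se = np.sqrt(sigma2 / n)
    gamma1 = g + se * np.linspace(-width, width, m)
    step = se * (np.sqrt(f) + width) / m
    rho1 = (np.arange(m) + 0.5) * step            # midpoints, so rho1 > 0
    G, R = np.meshgrid(gamma1, rho1, indexing="ij")
    # Work with the logarithm of (28) to avoid underflow.
    log_density = (-n * ((G - g) ** 2 + R ** 2) / (2.0 * sigma2)
                   + (f - 2) * np.log(R)
                   - 0.5 * (l + f) * np.log(b2 + G ** 2 + R ** 2))
    weights = np.exp(log_density - log_density.max())
    weights /= weights.sum()                      # plays the role of the normalizing factor
    return float((weights * G ** 2).sum()), float((weights * R ** 2).sum())

def expected_beta_squares(beta_star, e_gamma1_sq, e_rho1_sq):
    """Equation (30), evaluated for every r at once."""
    beta_star = np.asarray(beta_star, dtype=float)
    f = beta_star.size
    share = beta_star ** 2 / np.sum(beta_star ** 2)   # (beta_r* / g)^2
    return share * e_gamma1_sq + (1.0 - share) * e_rho1_sq / (f - 1)

beta_star = np.array([1.6, 0.8, -0.8, 0.4, 0.0])      # hypothetical estimates, f = 5
g = float(np.sqrt(np.sum(beta_star ** 2)))
e_g, e_r = posterior_moments(g, n=16, f=5, sigma2=1.0, b2=1.0, l=1.0)
e_beta_sq = expected_beta_squares(beta_star, e_g, e_r)
print(np.round(e_beta_sq, 4))
```

Each entry of `e_beta_sq` can then be compared with the corresponding $h_r^2$ as in (8). Because the grid is uniform, the grid spacing cancels when the weights are normalized.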


The case $f \ge n$ (supersaturated design)

The rank of $C$ is $n - 1$. The non-zero roots of $C$ ($= W'W$) are equal to those
of $WW'$, and if there are $n - 1$ such roots equal to $n$, $WW'$ must be $n$ times
the identity matrix. Thus the rows of $W$ are orthogonal and the sum of squares
of entries in every row is equal to $n$; the same is therefore true for $X$. A set of
characteristic vectors for $C$, corresponding to the non-zero roots, is the vectors
whose components are the rows of $W$. Let $(\beta_r^*)$ be the unique solution of the
equations

(31)  $u_i = \sum_r w_{ir} \beta_r \qquad (i = 2, 3, \ldots, n)$

that lies in the linear subspace spanned by these characteristic vectors; we
refer to this space as $V$. The general solution of the equations is then equal to
$(\beta_r^*)$ plus any arbitrary vector orthogonal to $V$, and so $(\beta_r^*)$ may be
characterized as the solution of (31) that minimizes $g$. We see easily that $\beta_r^*$ is given
by (24), and

(32)  $n g^2 = \sum_i (y_i - \bar{y})^2.$
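
The minimum-length solution can be illustrated numerically. This is a sketch under stated assumptions: it uses NumPy, and it builds a hypothetical $W$ with orthogonal rows of squared length $n$ from a QR factorization rather than from any particular design.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f = 5, 9                                    # supersaturated: f >= n
# Hypothetical W with orthogonal rows of squared length n, so that W W' = n I.
q, _ = np.linalg.qr(rng.standard_normal((f, n - 1)))   # f x (n - 1), orthonormal columns
W = np.sqrt(n) * q.T
u = rng.standard_normal(n - 1)                 # hypothetical u_2, ..., u_n

beta_star = W.T @ u / n                        # equation (24)
g_sq = float(beta_star @ beta_star)

# Any other solution of (31) adds a vector orthogonal to the rows of W and is longer.
z = rng.standard_normal(f)
other = beta_star + (z - W.T @ (W @ z) / n)
print(g_sq, float(other @ other))
```

Here `beta_star` coincides with the pseudo-inverse solution, and $n g^2$ equals the sum of squares of the $u_i$ for $i \ge 2$, which is the same quantity as the right-hand side of (32) because $T$ is orthogonal.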

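There is an orthogonal $f \times f$ matrix $P = (p_{rs})$, transforming $(\beta_r)$ to $(\gamma_r)$,
thus $\gamma_r = \sum_s p_{rs} \beta_s$, such that the first row $(p_{1s})$ is proportional to $(\beta_s^*)$, i.e.,
we have (25), and such that the first $n - 1$ rows of $P$ are linearly dependent
on the rows of $W$, i.e., they span $V$. Then the posterior distribution for the $\gamma_r$
is proportional to

(33)  $\dfrac{\exp[-n\{(\gamma_1 - g)^2 + \gamma_2^2 + \cdots + \gamma_{n-1}^2\} / 2\sigma^2] \prod_r d\gamma_r}{(b^2 + \gamma_1^2 + \gamma_2^2 + \cdots + \gamma_f^2)^{(l+f)/2}}.$

Let

(34)  $\rho_1^2 = \gamma_2^2 + \gamma_3^2 + \cdots + \gamma_{n-1}^2, \qquad \rho_2^2 = \gamma_n^2 + \gamma_{n+1}^2 + \cdots + \gamma_f^2.$

Then the joint posterior distribution for $\gamma_1$, $\rho_1$, and $\rho_2$ is proportional to

(35)  $\dfrac{\exp[-n\{(\gamma_1 - g)^2 + \rho_1^2\} / 2\sigma^2] \, \rho_1^{n-3} \rho_2^{f-n} \, d\gamma_1 \, d\rho_1 \, d\rho_2}{(b^2 + \gamma_1^2 + \rho_1^2 + \rho_2^2)^{(l+f)/2}}.$

(33) or (35) is our primary answer to the inference problem.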

For the particular decision problem of selecting "interesting" factors, we need
to evaluate $E(\beta_r^2)$. Since the first $n - 1$ rows of $P$ can be described as an
orthogonal transformation of the rows of $W$, after the latter rows have been made
into unit vectors by dividing every component by $\sqrt{n}$, we see that the sum of
squares of the first $n - 1$ entries in any column of $P$ is equal to the sum of
squares of entries in the corresponding column of $W$, divided by $n$. Our
calculations will now be simplified if the sum of squares of entries in each column of
$W$ is the same, equal therefore to $n(n - 1)/f$. Then the sum of squares of the
first $n - 1$ entries in any column of $P$ is equal to $(n - 1)/f$, and that of the
remaining entries is $1 - (n - 1)/f = (f - n + 1)/f$. Since clearly

(36)  $E(\gamma_r \gamma_s) = 0$ for $r \ne s$,
      $E(\gamma_2^2) = E(\gamma_3^2) = \cdots = E(\gamma_{n-1}^2) = E(\rho_1^2)/(n - 2),$
      $E(\gamma_n^2) = E(\gamma_{n+1}^2) = \cdots = E(\gamma_f^2) = E(\rho_2^2)/(f - n + 1),$

we obtain finally

(37)  $E(\beta_r^2) = (\beta_r^*/g)^2 E(\gamma_1^2) + [(n - 1)/f - (\beta_r^*/g)^2] E(\rho_1^2)/(n - 2) + E(\rho_2^2)/f.$

This can be calculated (for every value of $r$) as soon as $E(\gamma_1^2)$, $E(\rho_1^2)$ and $E(\rho_2^2)$
have been evaluated, and these may be found by numerical triple integration
over the region $-\infty < \gamma_1 < \infty$, $0 < \rho_1 < \infty$, $0 < \rho_2 < \infty$, of four expressions
in turn, namely (35) as it stands and also (35) multiplied by $\gamma_1^2$ or by $\rho_1^2$ or
by $\rho_2^2$.
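
One way to carry out this calculation is sketched below. It departs from the triple integration described above in one respect, which is an observation added here and not taken from the paper: the integral over $\rho_2$ is a beta-function integral, so that $\rho_2$ can be removed in closed form, giving $E(\rho_2^2 \mid \gamma_1, \rho_1) = (b^2 + \gamma_1^2 + \rho_1^2)(f - n + 1)/(l + n - 3)$ provided $l + n > 3$. The remaining double integral is done by a grid sum with NumPy; the estimates `beta_star` and the numerical settings are hypothetical.

```python
import numpy as np

def supersaturated_moments(g, n, f, sigma2, b2, l, m=400, width=10.0):
    """Return (E[gamma_1^2], E[rho_1^2], E[rho_2^2]) under the density (35)."""
    if l + n <= 3:
        raise ValueError("E[rho_2^2] is infinite unless l + n > 3")
    se = np.sqrt(sigma2 / n)
    gamma1 = g + se * np.linspace(-width, width, m)
    step = se * (np.sqrt(n) + width) / m
    rho1 = (np.arange(m) + 0.5) * step             # midpoints, so rho1 > 0
    G, R = np.meshgrid(gamma1, rho1, indexing="ij")
    A = b2 + G ** 2 + R ** 2
    # Marginal log-density of (gamma_1, rho_1) after integrating rho_2 out of (35).
    log_density = (-n * ((G - g) ** 2 + R ** 2) / (2.0 * sigma2)
                   + (n - 3) * np.log(R)
                   - 0.5 * (l + n - 1) * np.log(A))
    weights = np.exp(log_density - log_density.max())
    weights /= weights.sum()
    e_g = float((weights * G ** 2).sum())
    e_r1 = float((weights * R ** 2).sum())
    # Average the closed-form conditional expectation of rho_2^2.
    e_r2 = float((weights * A).sum()) * (f - n + 1) / (l + n - 3)
    return e_g, e_r1, e_r2

def expected_beta_squares(beta_star, n, e_g, e_r1, e_r2):
    """Equation (37), evaluated for every r at once."""
    beta_star = np.asarray(beta_star, dtype=float)
    f = beta_star.size
    share = beta_star ** 2 / np.sum(beta_star ** 2)    # (beta_r* / g)^2
    return share * e_g + ((n - 1) / f - share) * e_r1 / (n - 2) + e_r2 / f

beta_star = np.array([1.2, -0.4, 0.4, 0.8, 0.0, 0.4, -0.8, 0.4])   # hypothetical, f = 8
g = float(np.sqrt(np.sum(beta_star ** 2)))
e_g, e_r1, e_r2 = supersaturated_moments(g, n=6, f=8, sigma2=1.0, b2=1.0, l=2.0)
e_beta_sq = expected_beta_squares(beta_star, 6, e_g, e_r1, e_r2)
print(np.round(e_beta_sq, 4))
```

As in the unsaturated case, the truncation of the grid assumes that the posterior mass lies within `width` standard errors of $g$. A direct three-dimensional grid would also work but needs care with the slowly decaying tail in $\rho_2$.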

Our  assumptions  concerning  X,  to  support  the  above  analysis,  are  expressed 
in  the  following 

DESIGN CONDITION ($f \ge n$). The rows of $X$ are orthogonal, the sum of squares of
entries in each row is equal to $n$, and the sum of squares of entries in each column
other than the first is equal to $n(n - 1)/f$.

The effect of the above condition can be expressed geometrically as follows.
With only $n - 1$ degrees of freedom available for estimating the vector $(\beta_r)$
in an $f$-dimensional linear space, unique estimation by least squares can be carried
out only in an $(n - 1)$-dimensional subspace $V$. The condition implies that within
$V$ estimation is isotropic, i.e., has a spherical distribution of errors, and that $V$
itself is equally inclined to each of the $f$ co-ordinate axes. Thus the same amount
of information is provided about each response coefficient.

REMARKS

Construction  of  supersaturated  designs 

The suggested design condition for a supersaturated experiment can be
satisfied for any given $f$ and $n$, such that $f \ge n > 3$. A suitable $W$ matrix may be
constructed as follows. If $n - 1$ is even, $(n - 1)/2$ row pairs are taken, the $m$th
pair consisting of a row whose $r$th member is $\sqrt{2n/f}\,\sin(2\pi m r/f)$ and a row
whose $r$th member is $\sqrt{2n/f}\,\cos(2\pi m r/f)$. If $n - 1$ is odd, an extra single
row is included all of whose members are $\sqrt{n/f}$.
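
The construction can be written out and checked directly. The sketch below assumes NumPy; the function name is illustrative, and the values $n = 6$, $f = 9$ are an arbitrary example with $n - 1$ odd, so that the extra constant row is used.

```python
import numpy as np

def supersaturated_w(n, f):
    """Build an (n - 1) x f matrix W from sine/cosine row pairs."""
    if f < n or n < 3:
        raise ValueError("the construction assumes f >= n and n >= 3")
    r = np.arange(1, f + 1)
    rows = []
    for m in range(1, (n - 1) // 2 + 1):
        angle = 2.0 * np.pi * m * r / f
        rows.append(np.sqrt(2.0 * n / f) * np.sin(angle))
        rows.append(np.sqrt(2.0 * n / f) * np.cos(angle))
    if (n - 1) % 2 == 1:
        rows.append(np.full(f, np.sqrt(n / f)))    # the extra constant row
    return np.array(rows)

W = supersaturated_w(n=6, f=9)
row_gram = W @ W.T                      # should equal n times the identity
column_sums = (W ** 2).sum(axis=0)      # should all equal n (n - 1) / f
print(np.round(row_gram, 6))
print(np.round(column_sums, 6))
```

The rows come out orthogonal with squared length $n$, and every column has the same sum of squares, as the design condition for $f \ge n$ requires of $W$.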

Hunter  [5]  has  pointed  out  that  the  most  likely  occasion  for  a  supersaturated 
experiment is the early part of a sequential programme in which, with $f$ held
constant, $n$ is increased by steps, from a starting value less than $f + 1$, up to
$f + 1$ and higher values (unless the programme is abandoned first). It can be
shown that the above construction leads to a sequence of supersaturated designs,
all with the same $f$, and with $n$ increasing by steps of 2, such that each successive
$X$ matrix can be derived from the preceding one by adding two further rows and
then rescaling the entries in accordance with our convention, so that they
satisfy (3) and have the desired sum of squares. The corresponding operation
on the $W$ matrix consists of increasing $n$ by 2 everywhere and adding a further
row pair of the kind specified above. The transformation matrix $T$ must be
appropriately modified.

Speculations

It is very likely that (30) and (37) could be well approximated by expressions
far easier to calculate, and that the distributions (26) and (33) could sometimes
be adequately replaced by normal distributions.

The robustness of the above analysis to non-normality of the population of $\beta$'s
has not been investigated, but might be expected to be fairly good.

Anyone interested in the method of "confidence intervals" could no doubt
give an analogous treatment of supersaturated experiments, based on the
randomization and the design condition suggested above.


SUMMARY

If a mathematical expression is proposed for the distribution of initial probabilities
relating to some parameters, one must be alert to unintended consequences. When
there are many parameters, a distribution of the kind called "conjugate" by Raiffa
and Schlaifer may prove unsatisfactory, because by the law of large numbers it
implies certain properties of the observations which are not in fact borne out.

As an example of a problem with many parameters, I consider the statistical
analysis of observations made in a factorial experiment, following a formulation due to
E. M. L. Beale and C. L. Mallows. I consider in particular the analysis of a
"supersaturated" experiment, in which the number of parameters to be estimated exceeds the
number of observations. I show that the analysis becomes easiest when the design of the
experiment satisfies a certain condition. This condition can be met for any number
of factors and any number of observations.